
Score Shocks: The Burgers Equation Structure of Diffusion Generative Models

Sarkar, Krisanu

arXiv.org Machine Learning

We analyze the score field of a diffusion generative model through a Burgers-type evolution law. For VE diffusion, the heat-evolved data density implies that the score obeys viscous Burgers in one dimension and the corresponding irrotational vector Burgers system in $\mathbb{R}^d$, giving a PDE view of \emph{speciation transitions} as the sharpening of inter-mode interfaces. For any binary decomposition of the noised density into two positive heat solutions, the score separates into a smooth background and a universal $\tanh$ interfacial term determined by the component log-ratio; near a regular binary mode boundary this yields a normal criterion for speciation. In symmetric binary Gaussian mixtures, the criterion recovers the critical diffusion time detected by the midpoint derivative of the score and agrees with the spectral criterion of Biroli, Bonnaire, de~Bortoli, and Mézard (2024). After subtracting the background drift, the inter-mode layer has a local Burgers $\tanh$ profile, which becomes global in the symmetric Gaussian case with width $\sigma_\tau^2/a$. We also quantify exponential amplification of score errors across this layer, show that Burgers dynamics preserves irrotationality, and use a change of variables to reduce the VP-SDE to the VE case, yielding a closed-form VP speciation time. Gaussian-mixture formulas are verified to machine precision, and the local theorem is checked numerically on a quartic double-well.
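
The symmetric binary Gaussian case admits a closed-form check of the background-plus-$\tanh$ decomposition: for $p = \tfrac12\mathcal{N}(-a,\sigma^2)+\tfrac12\mathcal{N}(a,\sigma^2)$, the score is $-x/\sigma^2 + (a/\sigma^2)\tanh(ax/\sigma^2)$, a smooth linear background plus an interfacial term of width $\sigma^2/a$. A minimal numerical sketch of this standard identity (the paper's own verification setup may differ):

```python
import numpy as np

def mixture_logpdf(x, a, sigma):
    """Log density of 0.5*N(-a, sigma^2) + 0.5*N(a, sigma^2)."""
    g_minus = np.exp(-(x + a)**2 / (2 * sigma**2))
    g_plus = np.exp(-(x - a)**2 / (2 * sigma**2))
    return np.log(0.5 * (g_minus + g_plus)) - 0.5 * np.log(2 * np.pi * sigma**2)

def score_closed_form(x, a, sigma):
    """Smooth background -x/sigma^2 plus tanh interfacial term of width sigma^2/a."""
    return -x / sigma**2 + (a / sigma**2) * np.tanh(a * x / sigma**2)

a, sigma, h = 1.5, 0.7, 1e-5
x = np.linspace(-3, 3, 101)
# Central finite difference of log p matches the closed-form score.
score_fd = (mixture_logpdf(x + h, a, sigma) - mixture_logpdf(x - h, a, sigma)) / (2 * h)
assert np.allclose(score_fd, score_closed_form(x, a, sigma), atol=1e-5)
```

The $\tanh$ factor arises because the two components' log-ratio is linear, $\log(g_+/g_-) = 2ax/\sigma^2$, so their normalized difference is exactly $\tanh(ax/\sigma^2)$.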


Efficient machine unlearning with minimax optimality

Xie, Jingyi, Zhang, Linjun, Li, Sai

arXiv.org Machine Learning

There is a growing demand for efficient data removal to comply with regulations like the GDPR and to mitigate the influence of biased or corrupted data. This has motivated the field of machine unlearning, which aims to eliminate the influence of specific data subsets without the cost of full retraining. In this work, we propose a statistical framework for machine unlearning with generic loss functions and establish theoretical guarantees. For squared loss in particular, we develop Unlearning Least Squares (ULS) and establish its minimax optimality for estimating the model parameter of remaining data when only the pre-trained estimator, forget samples, and a small subsample of the remaining data are available. Our results reveal that the estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias. We further establish asymptotically valid inference procedures without requiring full retraining. Numerical experiments and real-data applications demonstrate that the proposed method achieves performance close to retraining while requiring substantially less data access.
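
For intuition in the squared-loss case, exact unlearning of an ordinary least-squares fit can be done by downdating the Gram matrix with the forget samples. This is a textbook baseline, not the paper's ULS estimator (which operates under stricter data access), but it illustrates why retraining from scratch is unnecessary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_forget = 200, 5, 30
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Sufficient statistics of the full-data fit.
A, b = X.T @ X, X.T @ y

# Downdate: subtract the forget block's contribution to the Gram matrix and cross term.
Xf, yf = X[:n_forget], y[:n_forget]
beta_unlearned = np.linalg.solve(A - Xf.T @ Xf, b - Xf.T @ yf)

# Identical to full retraining on the remaining data.
beta_retrained = np.linalg.lstsq(X[n_forget:], y[n_forget:], rcond=None)[0]
assert np.allclose(beta_unlearned, beta_retrained)
```

The downdate needs only the cached sufficient statistics and the forget samples; the interesting statistical questions arise, as in the paper, when even those statistics are unavailable and must be approximated from a subsample.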


Inversion-Free Natural Gradient Descent on Riemannian Manifolds

Draca, Dario, Matsubara, Takuo, Tran, Minh-Ngoc

arXiv.org Machine Learning

The natural gradient method is widely used in statistical optimization, but its standard formulation assumes a Euclidean parameter space. This paper proposes an inversion-free stochastic natural gradient method for probability distributions whose parameters lie on a Riemannian manifold. The manifold setting offers several advantages: one can implicitly enforce parameter constraints such as positive definiteness and orthogonality, ensure parameters are identifiable, or guarantee regularity properties of the objective like geodesic convexity. Building on an intrinsic formulation of the Fisher information matrix (FIM) on a manifold, our method maintains an online approximation of the inverse FIM, which is efficiently updated at quadratic cost using score vectors sampled at successive iterates. In the Riemannian setting, these score vectors belong to different tangent spaces and must be combined using transport operations. We prove almost-sure convergence rates of $O(\log s / s^\alpha)$ for the squared distance to the minimizer when the step size exponent $\alpha > 2/3$. We also establish almost-sure rates for the approximate FIM, which now accumulates transport-based errors. A limited-memory variant of the algorithm with sub-quadratic storage complexity is proposed. Finally, we demonstrate the effectiveness of our method relative to its Euclidean counterparts on variational Bayes with Gaussian approximations and normalizing flows.
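
In the Euclidean special case (where no transport operations are needed), an online inverse-FIM approximation can be maintained at quadratic cost per step by applying the Sherman-Morrison identity to an exponentially weighted outer-product estimate of score vectors. A minimal sketch of that mechanism, not the paper's full Riemannian algorithm:

```python
import numpy as np

def update_inverse_fim(F_inv, g, gamma):
    """Inverse of (1-gamma)*F + gamma*g g^T via Sherman-Morrison, O(d^2) per step."""
    Fg = F_inv @ g
    denom = (1 - gamma) + gamma * (g @ Fg)
    return (F_inv - (gamma / denom) * np.outer(Fg, Fg)) / (1 - gamma)

rng = np.random.default_rng(1)
d, gamma = 4, 0.1
F = np.eye(d)       # running FIM estimate (kept only to check the inverse)
F_inv = np.eye(d)   # the quantity the algorithm actually maintains
for _ in range(50):
    g = rng.normal(size=d)  # stands in for a score vector at the current iterate
    F = (1 - gamma) * F + gamma * np.outer(g, g)
    F_inv = update_inverse_fim(F_inv, g, gamma)

# The rank-one updates track the exact inverse without any matrix inversion.
assert np.allclose(F_inv, np.linalg.inv(F))
```

On a manifold, each `g` lives in the tangent space at its own iterate, so the update must first transport earlier information into the current tangent space, which is the source of the transport-based errors the paper's rates account for.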


Graph-Informed Adversarial Modeling: Infimal Subadditivity of Interpolative Divergences

Birmpa, Panagiota, Hall, Eric Joseph

arXiv.org Machine Learning

We study adversarial learning when the target distribution factorizes according to a known Bayesian network. For interpolative divergences, including $(f,\Gamma)$-divergences, we prove a new infimal subadditivity principle showing that, under suitable conditions, a global variational discrepancy is controlled by an average of family-level discrepancies aligned with the graph. In an additive regime, the surrogate is exact. This closes a theoretical gap in the literature; existing subadditivity results justify graph-informed adversarial learning for classical discrepancies, but not for interpolative divergences, where the usual factorization argument breaks down. In turn, we provide a justification for replacing a standard graph-agnostic GAN, which uses a single monolithic discriminator, with a graph-informed GAN (GiGAN) that uses localized family-level discriminators, without requiring the optimizer itself to factorize according to the graph. We also obtain parallel results for integral probability metrics and proximal optimal transport divergences, identify natural discriminator classes for which the theory applies, and present experiments showing improved stability and structural recovery relative to graph-agnostic baselines.


Information-Theoretic Limits of Safety Verification for Self-Improving Systems

Scrivens, Arsenios

arXiv.org Machine Learning

Can a safety gate permit unbounded beneficial self-modification while maintaining bounded cumulative risk? We formalize this question through dual conditions -- requiring $\sum_n \delta_n < \infty$ (bounded risk) and $\sum_n \mathrm{TPR}_n = \infty$ (unbounded utility) -- and establish a theory of their (in)compatibility. Classification impossibility (Theorem 1): for power-law risk schedules $\delta_n = O(n^{-p})$ with $p > 1$, any classifier-based gate under overlapping safe/unsafe distributions satisfies $\mathrm{TPR}_n \le C_\alpha \delta_n^\beta$ via Hölder's inequality, forcing $\sum_n \mathrm{TPR}_n < \infty$. This impossibility is exponent-optimal (Theorem 3). A second, independent proof via the Neyman-Pearson counting method (Theorem 4) yields a 13% tighter bound without Hölder's inequality. Universal finite-horizon ceiling (Theorem 5): for any summable risk schedule, the exact maximum achievable classifier utility is $U^*(N, B) = N \cdot \mathrm{TPR}_{\mathrm{NP}}(B/N)$, growing as $\exp(O(\sqrt{\log N}))$ -- subpolynomial. At $N = 10^6$ with budget $B = 1.0$, a classifier extracts at most $U^* \approx 87$ versus a verifier's $\approx 500{,}000$. Verification escape (Theorem 2): a Lipschitz ball verifier achieves $\delta = 0$ with $\mathrm{TPR} > 0$, escaping the impossibility. Formal Lipschitz bounds for pre-LayerNorm transformers under LoRA enable LLM-scale verification. The separation is strict. We validate on GPT-2 ($d_{\mathrm{LoRA}} = 147{,}456$): conditional $\delta = 0$ with $\mathrm{TPR} = 0.352$. Comprehensive empirical validation is in the companion paper [D2].
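
The finite-horizon ceiling is easy to evaluate once a distributional model for the gate's scores is fixed. As an illustrative assumption (the paper's exact model may differ), take unit-variance Gaussian safe/unsafe score distributions at unit mean separation, so the Neyman-Pearson ROC is $\mathrm{TPR}(\alpha)=\Phi(\Phi^{-1}(\alpha)+1)$; the ceiling $U^*(N,B)=N\cdot\mathrm{TPR}_{\mathrm{NP}}(B/N)$ at $N=10^6$, $B=1$ then comes out near 87:

```python
from statistics import NormalDist

Phi = NormalDist()  # standard normal CDF and quantile function

def utility_ceiling(N, B, separation=1.0):
    """U*(N, B) = N * TPR_NP(B/N) under a Gaussian-shift ROC (illustrative model)."""
    alpha = B / N  # per-step false-positive budget
    return N * Phi.cdf(Phi.inv_cdf(alpha) + separation)

u_star = utility_ceiling(10**6, 1.0)
assert 80 < u_star < 95  # ~87: tiny per-step budget forces a tiny per-step TPR
```

The subpolynomial growth is visible here: the budget per step shrinks like $1/N$, pushing the operating point far into the ROC's tail, so the per-step TPR collapses almost as fast as $N$ grows.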


Conformal Selective Prediction with General Risk Control

Bai, Tian, Jin, Ying

arXiv.org Machine Learning

In deploying artificial intelligence (AI) models, selective prediction offers the option to abstain from making a prediction when uncertain about model quality. To fulfill its promise, it is crucial to enforce strict and precise error control over cases where the model is trusted. We propose Selective Conformal Risk control with E-values (SCoRE), a new framework for deriving such decisions for any trained model and any user-defined, bounded and continuously-valued risk. SCoRE offers two types of guarantees on the risk among ``positive'' cases in which the system opts to trust the model. Built upon conformal inference and hypothesis testing ideas, SCoRE first constructs a class of (generalized) e-values, which are non-negative random variables whose product with the unknown risk has expectation no greater than one. Such a property is ensured by data exchangeability without requiring any modeling assumptions. Passing these e-values on to hypothesis testing procedures, we obtain binary trust decisions with finite-sample error control. SCoRE avoids the need for uniform concentration, and can be readily extended to settings with distribution shifts. We evaluate the proposed methods with simulations and demonstrate their efficacy through applications to error management in drug discovery, health risk prediction, and large language models.
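
SCoRE's generalized e-values extend the basic e-value mechanism from hypothesis testing. As a minimal illustration of that underlying mechanism (not the paper's conformal construction): a likelihood ratio has expectation at most one under the null, so thresholding it at $1/\alpha$ controls the type-I error at level $\alpha$ by Markov's inequality, with no asymptotics or concentration arguments:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n_rep, n = 0.1, 20000, 25

def e_value(x, mu_alt=0.5):
    """Likelihood ratio of N(mu_alt, 1) vs N(0, 1) on an i.i.d. sample x."""
    return float(np.exp(np.sum(mu_alt * x - 0.5 * mu_alt**2)))

# Data generated under the null: E[e] <= 1, so P(e >= 1/alpha) <= alpha by Markov.
null_samples = rng.normal(size=(n_rep, n))
rejections = sum(e_value(x) >= 1 / alpha for x in null_samples)
assert rejections / n_rep <= alpha  # finite-sample type-I control, observed empirically
```

SCoRE's e-values are "generalized" in that the Markov-type bound is on the product of the e-value with the unknown risk, which is what turns the thresholding step into a risk-control guarantee rather than a plain test.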


Instance-optimal stochastic convex optimization: Can we improve upon sample-average and robust stochastic approximation?

Jiang, Liwei, Pananjady, Ashwin

arXiv.org Machine Learning

We study the unconstrained minimization of a smooth and strongly convex population loss function under a stochastic oracle that introduces both additive and multiplicative noise; this is a canonical and widely-studied setting that arises across operations research, signal processing, and machine learning. We begin by showing that standard approaches such as sample average approximation and robust (or averaged) stochastic approximation can lead to suboptimal -- and in some cases arbitrarily poor -- performance with realistic finite sample sizes. In contrast, we demonstrate that a carefully designed variance reduction strategy, which we term VISOR for short, can significantly outperform these approaches while using the same sample size. Our upper bounds are complemented by finite-sample, information-theoretic local minimax lower bounds, which highlight fundamental, instance-dependent factors that govern the performance of any estimator. Taken together, these results demonstrate that an accelerated variant of VISOR is instance-optimal, achieving the best possible sample complexity up to logarithmic factors while also attaining optimal oracle complexity. We apply our theory to generalized linear models and improve upon classical results. In particular, we obtain the best-known non-asymptotic, instance-dependent generalization error bounds for stochastic methods, even in linear regression.


Algorithmic warm starts for Hamiltonian Monte Carlo

Zhang, Matthew S., Altschuler, Jason M., Chewi, Sinho

arXiv.org Machine Learning

Generating samples from a continuous probability density is a central algorithmic problem across statistics, engineering, and the sciences. For high-dimensional settings, Hamiltonian Monte Carlo (HMC) is the default algorithm across mainstream software packages. However, despite the extensive line of work on HMC and its widespread empirical success, it remains unclear how many iterations of HMC are required as a function of the dimension $d$. On one hand, a variety of results show that Metropolized HMC converges in $O(d^{1/4})$ iterations from a warm start close to stationarity. On the other hand, Metropolized HMC is significantly slower without a warm start, e.g., requiring $\Omega(d^{1/2})$ iterations even for simple target distributions such as isotropic Gaussians. Finding a warm start is therefore the computational bottleneck for HMC. We resolve this issue for the well-studied setting of sampling from a probability distribution satisfying strong log-concavity (or isoperimetry) and third-order derivative bounds. We prove that \emph{non-Metropolized} HMC generates a warm start in $\tilde{O}(d^{1/4})$ iterations, after which we can exploit the warm start using Metropolized HMC. Our final complexity of $\tilde{O}(d^{1/4})$ is the fastest algorithm for high-accuracy sampling under these assumptions, improving over the prior best of $\tilde{O}(d^{1/2})$. This closes the long line of work on the dimensional complexity of MHMC for such settings, and also provides a simple warm-start prescription for practical implementations.
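
For reference, one iteration of non-Metropolized HMC is a momentum refresh followed by leapfrog integration of Hamiltonian dynamics, with no accept/reject correction. A minimal generic sketch on an isotropic Gaussian target (the step-size schedules and iteration counts that drive the paper's $\tilde{O}(d^{1/4})$ warm-start guarantee are the substance of its analysis and are not reproduced here):

```python
import numpy as np

def leapfrog(x, p, grad_logpi, step, n_steps):
    """Leapfrog integration of Hamiltonian dynamics (volume-preserving, reversible)."""
    p = p + 0.5 * step * grad_logpi(x)       # initial half step for momentum
    for i in range(n_steps):
        x = x + step * p                     # full step for position
        if i < n_steps - 1:
            p = p + step * grad_logpi(x)     # full step for momentum
    p = p + 0.5 * step * grad_logpi(x)       # final half step for momentum
    return x, p

def unadjusted_hmc(x0, grad_logpi, step=0.1, n_steps=10, n_iters=100, seed=0):
    """Non-Metropolized HMC: refresh momentum, integrate, keep the endpoint."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(n_iters):
        p = rng.normal(size=x.shape)         # momentum refresh
        x, _ = leapfrog(x, p, grad_logpi, step, n_steps)
    return x

d = 50
grad_logpi = lambda x: -x                    # target N(0, I_d)
samples = np.array([unadjusted_hmc(np.zeros(d), grad_logpi, seed=s) for s in range(100)])
# Coordinate variance should be close to 1 (up to O(step^2) discretization bias).
assert 0.85 < np.mean(samples**2) < 1.15
```

The omitted accept/reject step is exactly what distinguishes this from Metropolized HMC: without it, the chain carries an $O(\mathrm{step}^2)$ discretization bias, which is why the paper uses the unadjusted chain only to produce a warm start before switching to the Metropolized version.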


Pseudo-Labeling for Unsupervised Domain Adaptation with Kernel GLMs

Weill, Nathan, Wang, Kaizheng

arXiv.org Machine Learning

We propose a principled framework for unsupervised domain adaptation under covariate shift in kernel Generalized Linear Models (GLMs), encompassing kernelized linear, logistic, and Poisson regression with ridge regularization. Our goal is to minimize prediction error in the target domain by leveraging labeled source data and unlabeled target data, despite differences in covariate distributions. We partition the labeled source data into two batches: one for training a family of candidate models, and the other for building an imputation model. This imputation model generates pseudo-labels for the target data, enabling robust model selection. We establish non-asymptotic excess-risk bounds that characterize adaptation performance through an "effective labeled sample size", explicitly accounting for the unknown covariate shift. Experiments on synthetic and real datasets demonstrate consistent performance gains over source-only baselines.
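
A minimal sketch of the split-and-impute selection scheme for kernel ridge regression under covariate shift. The kernel, bandwidth, candidate grid, and data-generating process below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth**2))

def krr_fit(X, y, lam):
    """Kernel ridge regression; returns a predictor closure."""
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return lambda Z: rbf_kernel(Z, X) @ alpha

rng = np.random.default_rng(3)
f = lambda X: np.sin(3 * X[:, 0])
Xs = rng.normal(0.0, 1.0, size=(200, 1))
ys = f(Xs) + 0.1 * rng.normal(size=200)
Xt = rng.normal(1.0, 0.5, size=(100, 1))   # shifted target covariates, unlabeled

# Split the labeled source data: batch 1 trains candidates, batch 2 the imputation model.
X1, y1, X2, y2 = Xs[:100], ys[:100], Xs[100:], ys[100:]
candidates = {lam: krr_fit(X1, y1, lam) for lam in [1e-4, 1e-2, 1.0]}
imputer = krr_fit(X2, y2, lam=1e-3)

# Pseudo-labels on the target domain drive model selection where no labels exist.
y_pseudo = imputer(Xt)
best_lam = min(candidates, key=lambda lam: np.mean((candidates[lam](Xt) - y_pseudo)**2))
```

The point of the split is that the imputation model's errors are independent of the candidates', so comparing candidates against pseudo-labels on the target covariates ranks them by (approximate) target-domain risk; here the heavily oversmoothed candidate $\lambda = 1$ is rejected.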


Sparse Bayesian Deep Functional Learning with Structured Region Selection

Zhu, Xiaoxian, Li, Yingmeng, Ma, Shuangge, Wu, Mengyun

arXiv.org Machine Learning

In modern applications such as ECG monitoring, neuroimaging, wearable sensing, and industrial equipment diagnostics, complex and continuously structured data are ubiquitous, presenting both challenges and opportunities for functional data analysis. However, existing methods face a critical trade-off: conventional functional models are limited by linearity, whereas deep learning approaches lack interpretable region selection for sparse effects. To bridge these gaps, we propose a sparse Bayesian functional deep neural network (sBayFDNN). It learns adaptive functional embeddings through a deep Bayesian architecture to capture complex nonlinear relationships, while a structured prior enables interpretable, region-wise selection of influential domains with quantified uncertainty. Theoretically, we establish rigorous approximation error bounds, posterior consistency, and region selection consistency. These results provide the first theoretical guarantees for a Bayesian deep functional model, ensuring its reliability and statistical rigor. Empirically, comprehensive simulations and real-world studies confirm the effectiveness and superiority of sBayFDNN. Crucially, sBayFDNN excels in recognizing intricate dependencies for accurate predictions and more precisely identifies functionally meaningful regions, capabilities fundamentally beyond existing approaches.